fix(subagent): make the system prompt a fixed trust boundary by jkyberneees · Pull Request #2 · BackendStack21/odek

jkyberneees · 2026-06-01T18:20:23Z

Summary

Make the sub-agent system prompt a fixed trust boundary the parent agent cannot write to. All parent-supplied steering moves into the sub-agent's request, where the SAFETY rules frame it as data — not as identity-defining instructions.

The problem

delegate_tasks let the parent set a per-task system field that replaced the sub-agent's system prompt wholesale (dropping the SAFETY/anti-injection block), and buildSubagentPrompt embedded the raw, parent-supplied goal text directly into the system message. Because goal/context can carry text the parent ingested from untrusted sources (fetched pages, MCP output, files), a prompt-injection payload could redefine a sub-agent's identity or strip its safety rules.

The fix

Fixed system prompt. The sub-agent prompt is now a code-defined constant (subagentSystem). Nothing the parent supplies is ever spliced into it. Its SAFETY block is strengthened and explicitly states the request is data, not identity.
Guidance in the request. goal + guidance + context are assembled into the user message by buildSubagentRequest(). Removed buildSubagentPrompt and the taskSystem / ODEK_SYSTEM overrides for sub-agents.
system → guidance. The delegate_tasks field is renamed and re-described as how to approach the task (request-level), not a system prompt.
Untrusted fencing. When trust_level: "untrusted", the request body is wrapped in an <untrusted_input> fence — defense-in-depth alongside the existing applySubagentTrust permission clamp.

Tests

New subagent_prompt_isolation_test.go: the system prompt is unaffected by (even hostile) parent input; the request carries goal/guidance/context; untrusted tasks are fenced; trusted ones are not.
Removed the obsolete buildSubagentPrompt persona tests; updated the tool-schema test (system → guidance, asserts system is absent) and the e2e tests.
go build ./..., go vet, gofmt, and the cmd/odek suite pass.

Docs

docs/SUBAGENTS.md: replaced "Dynamic system prompts" with "System prompt & request (trust boundary)".
docs/SECURITY.md §7: documents the fixed-prompt boundary and what changed.

Behavior change / compat

The system field on delegate_tasks is removed (replaced by guidance), and ODEK_SYSTEM/config system no longer apply to sub-agents. The tool schema is regenerated each run, so there are no external consumers; the parent model is steered via the new field description. The dynamic persona auto-selection is intentionally dropped — approach is now expressed via guidance.

🤖 Generated with Claude Code

The sub-agent system prompt was parent-writable: delegate_tasks exposed a `system` field that REPLACED the prompt wholesale (dropping the SAFETY block), and buildSubagentPrompt embedded the raw, parent-supplied goal text directly into the system message. Since goal/context can carry text the parent ingested from untrusted sources (fetched pages, MCP output, files), a prompt-injection payload could redefine the sub-agent's identity or strip its anti-injection rules.

Harden the boundary:

- The sub-agent system prompt is now a FIXED, code-defined constant. Nothing the parent supplies is ever spliced into it. Strengthen its SAFETY block and state that the request is data, not identity.
- All parent guidance moves into the user REQUEST via buildSubagentRequest() (goal + guidance + context). Remove buildSubagentPrompt and the taskSystem / ODEK_SYSTEM overrides for sub-agents.
- Rename the delegate_tasks `system` field to `guidance` (how to approach the task, delivered in the request), and re-describe it.
- When trust_level=untrusted, wrap the request body in an <untrusted_input> fence (defense-in-depth alongside the existing applySubagentTrust clamp).

Tests: drop the obsolete buildSubagentPrompt persona tests; add subagent_prompt_isolation_test.go asserting the system prompt is unaffected by (even hostile) parent input, that the request carries goal/guidance/context, and that untrusted tasks are fenced. Update schema + e2e tests (system->guidance).

Docs: rewrite SUBAGENTS.md "system prompt & request" section and update SECURITY.md §7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jkyberneees merged commit 5429024 into main Jun 1, 2026
6 checks passed

jkyberneees merged 1 commit into main from harden/subagent-prompt-isolation
